key-value store
Towards Efficient Key-Value Cache Management for Prefix Prefilling in LLM Inference
Zhu, Yue, Yu, Hao, Wang, Chen, Liu, Zhuoran, Lee, Eun Kyung
--The increasing adoption of large language models (LLMs) with extended context windows necessitates efficient Key-V alue Cache (KVC) management to optimize inference performance. We analyze real-world KVC access patterns using publicly available traces and evaluate commercial key-value stores like Redis and state-of-the-art RDMA-based systems (CHIME [1] and Sherman [2]) for KVC metadata management. Our work demonstrates the lack of tailored storage solution for KVC prefilling, underscores the need for an efficient distributed caching system with optimized metadata management for LLM workloads, and provides insights into designing improved KVC management systems for scalable, low-latency inference. Large Language Models (LLMs) have shown remarkable ability in tasks like text generation, translation, and question-answering, but their attention architecture introduces significant challenges. The use of key-value caches (KVC) in attention layer of transformer models, while essential for efficient token generation, requires substantial memory resources.
- North America > United States (0.04)
- Asia > Mongolia (0.04)
Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic Workloads
Mo, Dingheng, Chen, Fanchao, Luo, Siqiang, Shan, Caihua
LSM-trees are widely adopted as the storage backend of key-value stores. However, optimizing the system performance under dynamic workloads has not been sufficiently studied or evaluated in previous work. To fill the gap, we present RusKey, a key-value store with the following new features: (1) RusKey is a first attempt to orchestrate LSM-tree structures online to enable robust performance under the context of dynamic workloads; (2) RusKey is the first study to use Reinforcement Learning (RL) to guide LSM-tree transformations; (3) RusKey includes a new LSM-tree design, named FLSM-tree, for an efficient transition between different compaction policies -- the bottleneck of dynamic key-value stores. We justify the superiority of the new design with theoretical analysis; (4) RusKey requires no prior workload knowledge for system adjustment, in contrast to state-of-the-art techniques. Experiments show that RusKey exhibits strong performance robustness in diverse workloads, achieving up to 4x better end-to-end performance than the RocksDB system under various settings.
- Europe > Austria > Vienna (0.14)
- Asia > Singapore (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
Building a Scalable ML Feature Store with Redis
When a company with millions of consumers such as DoorDash builds machine learning (ML) models, the amount of feature data can grow to billions of records with millions actively retrieved during model inference under low latency constraints. These challenges warrant a deeper look into selection and design of a feature store -- the system responsible for storing and serving feature data. The decisions made here can prevent overrunning cost budgets, compromising runtime performance during model inference, and curbing model deployment velocity. Features are the input variables fed to an ML model for inference. A feature store, simply put, is a key-value store that makes this feature data available to models in production. At DoorDash, our existing feature store was built on top of Redis, but had a lot of inefficiencies and came close to running out of capacity. We ran a full-fledged benchmark evaluation on five different key-value stores to compare their cost and performance metrics.
Interleaved Sequence RNNs for Fraud Detection
Branco, Bernardo, Abreu, Pedro, Gomes, Ana Sofia, Almeida, Mariana S. C., Ascensão, João Tiago, Bizarro, Pedro
Payment card fraud causes multibillion dollar losses for banks and merchants worldwide, often fueling complex criminal activities. To address this, many real-time fraud detection systems use tree-based models, demanding complex feature engineering systems to efficiently enrich transactions with historical data while complying with millisecond-level latencies. In this work, we do not require those expensive features by using recurrent neural networks and treating payments as an interleaved sequence, where the history of each card is an unbounded, irregular sub-sequence. We present a complete RNN framework to detect fraud in real-time, proposing an efficient ML pipeline from preprocessing to deployment. We show that these feature-free, multi-sequence RNNs outperform state-of-the-art models saving millions of dollars in fraud detection and using fewer computational resources.
- North America > United States > California > San Diego County > San Diego (0.05)
- North America > United States > New York > New York County > New York City (0.04)
Towards Better Interpretability in Deep Q-Networks
Annasamy, Raghuram Mandyam, Sycara, Katia
Deep reinforcement learning techniques have demonstrated superior performance in a wide variety of environments. As improvements in training algorithms continue at a brisk pace, theoretical or empirical studies on understanding what these networks seem to learn, are far behind. In this paper we propose an interpretable neural network architecture for Q-learning which provides a global explanation of the model's behavior using key-value memories, attention and reconstructible embeddings. With a directed exploration strategy, our model can reach training rewards comparable to the state-of-the-art deep Q-learning models. However, results suggest that the features extracted by the neural network are extremely shallow and subsequent testing using out-of-sample examples shows that the agent can easily overfit to trajectories seen during training.
Deploying Deep Ranking Models for Search Verticals
Ramanath, Rohan, Polatkan, Gungor, Xu, Liqin, Lee, Harold, Hu, Bo, Zhou, Shan
In this paper, we present an architecture executing a complex machine learning model such as a neural network capturing semantic similarity between a query and a document; and deploy to a real-world production system serving 500M+users. We present the challenges that arise in a real-world system and how we solve them. We demonstrate that our architecture provides competitive modeling capability without any significant performance impact to the system in terms of latency. Our modular solution and insights can be used by other real-world search systems to realize and productionize recent gains in neural networks.
NoSQL Data Architecture & Data Governance: Everything You Need to Know - DATAVERSITY
Click to learn more about author Akshay Pore. NoSQL databases have seen increased adoption in the last five years. A growing number of companies have at least one NoSQL database as a part of their enterprise data landscape. In today's IT environment, where DevOps, SAFe and Agile are being increasingly embraced, utilization of NoSQL database for application development is seen as a huge advantage in speeding up time to market for software products. Developers have been quick to adopt NoSQL databases due to their flexible schema, efficient processing & storage of unstructured & semi-structured data and ability to support high performance queries in a scale out environment.
Top NoSQL Database Engines
I am not a fan of the term NoSQL. Many others are, however, and it has become a permanent part of the collective data storage nomenclature, meant to describe schema-less, non-relational data storage schemes. NoSQL is an umbrella term, one which encompasses a number of different technologies. These different technologies aren't even necessarily related in any way beyond the single defining characteristic of NoSQL: they are not relational in nature; for right or wrong, Structured Query Language (SQL) has become conflated with relational database management systems over the years. So, while I am not personally a fan of the term NoSQL, I can appreciate why others are, given that it quickly implies what it is we are talking about by explicitly stating what we are not talking about.
Model-Parallel Inference for Big Topic Models
Zheng, Xun, Kim, Jin Kyu, Ho, Qirong, Xing, Eric P.
In real world industrial applications of topic modeling, the ability to capture gigantic conceptual space by learning an ultra-high dimensional topical representation, i.e., the so-called "big model", is becoming the next desideratum after enthusiasms on "big data", especially for fine-grained downstream tasks such as online advertising, where good performances are usually achieved by regression-based predictors built on millions if not billions of input features. The conventional data-parallel approach for training gigantic topic models turns out to be rather inefficient in utilizing the power of parallelism, due to the heavy dependency on a centralized image of "model". Big model size also poses another challenge on the storage, where available model size is bounded by the smallest RAM of nodes. To address these issues, we explore another type of parallelism, namely model-parallelism, which enables training of disjoint blocks of a big topic model in parallel. By integrating data-parallelism with model-parallelism, we show that dependencies between distributed elements can be handled seamlessly, achieving not only faster convergence but also an ability to tackle significantly bigger model size. We describe an architecture for model-parallel inference of LDA, and present a variant of collapsed Gibbs sampling algorithm tailored for it. Experimental results demonstrate the ability of this system to handle topic modeling with unprecedented amount of 200 billion model variables only on a low-end cluster with very limited computational resources and bandwidth.
- Asia > Middle East > Jordan (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > Singapore (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)